Miscellaneous Items

Remove Duplicate Values

Remove Duplicate Values is primarily designed to remove duplicate in the spreadsheet. It can also be used to remove duplicate values in a single column or in a set of columns. When removing duplicate values, only the values in the selected range of cells or table are affected, values outside the range of cells or table are not altered or removed. When data is removed, either an entire row or a set of columns, the data below it is moved up to fill in the removed cells.

To remove duplicate values, use the Remove Duplicates command in the Data Tools group on the Data tab. When you remove duplicate values, you permanently delete duplicate values, rather than just hide them. A duplicate value is an exact match of a value in another row and are determined by the value displayed in the cell not the value stored in the cell ("3/8/2006" <> "Mar 8, 2006"). Because you are permanently deleting data, it is a best practice to copy the original range of cells or table to another worksheet or workbook before removing duplicate values.

  1. Select the range of cells from which you want to remove duplicates.

  2. Copy the range to another sheet. (optional, but a best practice)

  3. On the Data tab, in the Data Tools group, click Remove Duplicates. (see figure)

    Remove Duplicates
  4. Under Columns, select one or more columns. If you only selected one column just choose Select All to get the entire row of data. If your column or table has headers that you do not want to move, check the "my data has headers" box. Click OK.

  5. A message is displayed indicating how many duplicate values were removed and how many unique values remain, or if no duplicate values were removed. Click OK.

Known Issue: You cannot remove duplicate values from data that is outlined or that has subtotals. To remove duplicates, you must remove both the outline and the subtotals.

Rotate data from columns to rows or vice versa

Sometimes you have have a set of data in rows and columns but it doesn't display well, or is hard to visualize. Often, one way to improve the view of data is to transpose rows and columns. Transposing turns the rows into columns and the columns become the rows. Suppose you have some quarterly sales data of various products with the product names in the first column and the sales data in subsequent colmuns. It might be more understandable if the products names were the column names and the quarterly sales data is given in rows. Transposing the data performs this function.

Transposing, or rotating the data is simply a Copy and Paste operation. Firs select the range of data you want to rotate, copy it (not cut), then Paste it choosing the Transpose option.

Transpose

Issues: Make sure you do not overwrite data by choosing a place with enough room to paste your transposed data. Data in an Excel table cannot be transposed, first covert table to range or use TRANSPOSE function.

Working with Named Ranges

In the previous section, and in many of the subsequent sections we will be working with ranges. A range in Microsoft Excel is a group of cells that are identified by the top leftmost cell as the beginning of the range and the bottom rightmost cell as the end of the range, separated by a colon. For example, A3:D20 is a range starting with Column A and Row 3 and ending with Column D and Row 20. So this range comprises four Columns (A, B, C, and D) and 17 Rows (3 through and including 20).

The cell references that we just used are called "Relative References" or floating references. Relative cell references are useful because if you insert more rows or columns, or copy the contents of cell to another cell, the reference is updated relative to the new location. For example, if we were using the formula SUM(A2:A6) and needed to insert a row before row 2, the formula would automatically change to SUM(A3:A7). In other words, the position of the range, relative to the updated locate of the SUM function would remain the same.

If we want to reference a range, and keep it constant, then we would use Absolute References by putting a dollar sign in front of each identifier. In other words, A2:D20 would become $A$2:$D$20. This is an absolute range and its values do not update even when the formula is copied into other cells. The absolute reference can also be partial such as $A2:$D20, which means the columns (A, B, C, and D) are absolute and do not change where the rows (2 through 20) are relative and can change.

When working with functions it is often occurs that a defined range of values is referenced many times in different cells or various functions. In that case, it is frequently helpful to define and absolute range and to give that absolute range a name. That is a called a Named Range.

To define a Named Range:

  1. Use the Formulas Ribbon

  2. Select the Range of cells you want to be in the Range

  3. Click the Define Name menu

  4. Name the range of cells. Notice that it is an absolute range. Excel tries to define a default name from the column name or the top text value in the range. You can override that name.

From the SUM examples in the previous section, if we defined the range $A$2:$A$6 with the name of Amounts, then the formulast in Figure 1.2 would be:

  • SUM(Amounts)

  • SUMIF(Amounts, ">300000",B2:B6)

SMALL, LARGE

You should already be familiar with the functions MIN(range) and MAX(range). The arguments to MIN and MAX can be a list of cells, or a range or array. The functions SMALL(array,k) and LARGE(array,k) return the kth smallest or kth largest value in the array or range All of these functions are expecting numeric values. All of these functions ignore empty cells or cells that do not contain numeric values.

  • SMALL(array, k) -- Returns the k-th smallest value in a data set (a particular location in the data set).

  • LARGE(array, k) -- Returns the k-th largest value in a data set. (Example: Return the highest, runner-up, or third-place score).

Array -- An array or range of numerical data for which you want to determine the k-th smallest or largest value.

K -- The position (from the smallest or largest) in the array or range of data to return.

Figure 2.4: Example of SMALL function

In the example above, the function SMALL(B$3:b$11,D4) D4 contains the value 1, so it returns the smallest value. The cell in E5 will have the formual SMALL(B$4:B$11, D5), which will use the value 2 to find the second place value. The cell in E6 will find the 3rd place value from D6.

Comment: If n is the number of data points in array, SMALL(array,1) equals the smallest value, and SMALL(array,n) equals the largest value.

Comment: If n is the number of data points in a range, then LARGE(array,1) returns the largest value, and LARGE(array,n) returns the smallest value.